A. Why Multi-Agent?

Scaling from one brain to a team

Agenda

  • A. Why Multi-Agent? — The single-agent bottleneck
  • B. Specialist Roles — Researcher, Analyst, Writer
  • C. Orchestration Patterns — Pipeline, Hub-and-Spoke, Workspace
  • D. Parallelism — Async execution for independent tasks
  • E. Quality Gates — Review loops and conflict resolution
  • F. Wrap-up — Key takeaways & lab preview

The Single-Agent Bottleneck

Your ReactAgent from Session 1 is powerful, but it has limits:

  1. Context window saturation — research + analysis + writing all pile into one conversation
  2. Role confusion — “research thoroughly AND write concisely” produces mediocre results at both
  3. Sequential execution — one agent can only do one thing at a time

When Multi-Agent Actually Helps

| Scenario | Single Agent | Multi-Agent |
| --- | --- | --- |
| Simple research question | Perfect | Overkill |
| Deep research + polished report | Context overload by step 8 | Specialists stay focused |
| Compare 3+ independent topics | Sequential, slow | Parallel research agents |
| Tasks needing quality gates | Agent grades its own work | Analyst reviews Researcher |
| User needs progress by stage | One blob at the end | Stream results per specialist |

The #1 Mistake

Premature decomposition — creating 5 agents for a task that one agent handles fine. Always start with a single agent. Only split when you hit a bottleneck.

B. Specialist Roles

Same brain, different personalities

The Specialization Pattern

Each specialist is a ReactAgent with:

  • A focused system prompt that constrains its role
  • A curated tool set (researchers get search, writers get formatting)
  • Lower max_steps (specialists finish faster)
# The code is identical — only the prompt and tools differ
agent = ReactAgent(model="gpt-4o", max_steps=8, system_prompt=RESEARCHER_PROMPT)
agent.tools = registry.get_tools_by_category("research")

The Researcher

Job: Find and retrieve relevant information. Cites everything.

RESEARCHER_PROMPT = """You are a Research Specialist. Your ONLY job is
to find and retrieve relevant information. You do NOT analyze or write.

Your standards:
- Always cite your sources with URLs or document references.
- Retrieve from multiple sources to avoid single-source bias.
- If search results are thin, reformulate your query before giving up.
- Return raw findings organized by source. Do NOT editorialize."""

Tools: search, web_reader, document_retrieval

The Analyst

Job: Evaluate information, cross-reference, identify gaps.

ANALYST_PROMPT = """You are an Analysis Specialist. Your ONLY job is
to evaluate information and extract insights.

Your standards:
- Cross-reference claims across sources. Flag contradictions explicitly.
- Distinguish between facts, opinions, and speculation.
- Identify gaps: what important questions does the research NOT answer?
- Rate confidence: High / Medium / Low per claim."""

Tools: Reasoning-only (no search tools — works from provided context)

The Writer

Job: Synthesize analysis into polished, readable output.

WRITER_PROMPT = """You are a Writing Specialist. Your ONLY job is
to take analyzed research and produce a clear, well-structured document.

Your standards:
- Write for the specified audience (default: informed professional).
- Structure with clear headings, topic sentences, and transitions.
- Preserve source citations from the research phase.
- Include confidence qualifiers from the analysis."""

Tools: Formatting-only (pure generation from provided context)
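
The three prompts above differ only in role, tools, and step budget. As a sketch of that idea (the SpecialistSpec dataclass and the tool names below are illustrative, not part of the session's ReactAgent API), the whole roster can be captured declaratively:

```python
from dataclasses import dataclass

@dataclass
class SpecialistSpec:
    """Illustrative container: everything that distinguishes one specialist."""
    name: str
    system_prompt: str
    tools: list[str]
    max_steps: int = 8

SPECIALISTS = {
    "researcher": SpecialistSpec("researcher", "You are a Research Specialist...",
                                 ["search", "web_reader", "document_retrieval"]),
    "analyst": SpecialistSpec("analyst", "You are an Analysis Specialist...",
                              [], max_steps=5),        # reasoning-only: no tools
    "writer": SpecialistSpec("writer", "You are a Writing Specialist...",
                             ["format_document"], max_steps=5),  # hypothetical formatting tool
}
```

The point of the table-like structure: adding a fourth specialist is one more entry, not a new agent class.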

C. Orchestration Patterns

Three ways to coordinate agents

Pattern 1: Pipeline (Relay Race)

Agents execute sequentially, each passing output to the next.

graph LR
    R["Researcher"] -->|findings| A["Analyst"]
    A -->|analysis| W["Writer"]
    W -->|report| O["Output"]

    style R fill:#00C9A7,stroke:#1C355E,color:#1C355E
    style A fill:#9B8EC0,stroke:#1C355E,color:#1C355E
    style W fill:#FF7A5C,stroke:#1C355E,color:#1C355E
    style O fill:#1C355E,stroke:#00C9A7,color:white

  • Pros: Simple, predictable, easy to debug
  • Cons: Slow (no parallelism), each agent waits for the previous one
  • Use when: Tasks have clear sequential dependencies
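
A pipeline is just three calls in sequence. A minimal sketch, with stub callables standing in for the real agents (run_pipeline and the lambdas are illustrative):

```python
def run_pipeline(query, researcher, analyst, writer):
    """Relay race: each stage consumes the previous stage's output."""
    findings = researcher(query)    # Phase 1: raw findings
    analysis = analyst(findings)    # Phase 2: structured analysis
    return writer(analysis)         # Phase 3: polished report

# Stubs make the data flow visible:
report = run_pipeline(
    "topic X",
    researcher=lambda q: f"findings({q})",
    analyst=lambda f: f"analysis({f})",
    writer=lambda a: f"report({a})",
)
# report == "report(analysis(findings(topic X)))"
```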

Pattern 2: Hub-and-Spoke (Supervisor)

A central Orchestrator assigns tasks and collects results.

graph TB
    O["Orchestrator<br/>(Supervisor)"]
    O -->|assign| R1["Researcher 1"]
    O -->|assign| R2["Researcher 2"]
    O -->|assign| A["Analyst"]
    R1 -->|results| O
    R2 -->|results| O
    A -->|analysis| O

    style O fill:#1C355E,stroke:#00C9A7,color:white
    style R1 fill:#00C9A7,stroke:#1C355E,color:#1C355E
    style R2 fill:#00C9A7,stroke:#1C355E,color:#1C355E
    style A fill:#9B8EC0,stroke:#1C355E,color:#1C355E

  • Pros: Parallel execution, orchestrator controls flow
  • Cons: Orchestrator is a single point of failure
  • Use when: Tasks have independent sub-problems
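
The supervisor's fan-out/fan-in can be sketched in a few lines (supervise and stub_worker are illustrative names, not the lab's orchestrator):

```python
import asyncio

async def supervise(tasks, worker):
    """Hub-and-spoke: assign every task, collect every result centrally."""
    results = await asyncio.gather(*(worker(t) for t in tasks))
    return dict(zip(tasks, results))

async def stub_worker(task):
    return f"result({task})"

collected = asyncio.run(supervise(["research A", "research B"], stub_worker))
# {"research A": "result(research A)", "research B": "result(research B)"}
```

Note the single-point-of-failure property is visible here: if `supervise` raises, every worker's result is lost.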

Pattern 3: Shared Workspace

Agents read from and write to a shared state.

graph TB
    R["Researcher"] -->|write| WS["Shared Workspace<br/>(entries by type)"]
    A["Analyst"] -->|read/write| WS
    W["Writer"] -->|read/write| WS
    WS -->|read| A
    WS -->|read| W

    style R fill:#00C9A7,stroke:#1C355E,color:#1C355E
    style A fill:#9B8EC0,stroke:#1C355E,color:#1C355E
    style W fill:#FF7A5C,stroke:#1C355E,color:#1C355E
    style WS fill:#1C355E,stroke:#00C9A7,color:white

  • Pros: Flexible, agents can iterate on shared context
  • Cons: Coordination complexity — who reads what and when?
  • Use when: Tasks require iterative refinement
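
A shared workspace can be as simple as lists of entries keyed by type, matching the diagram above (SharedWorkspace is a sketch, not the lab's implementation):

```python
from collections import defaultdict

class SharedWorkspace:
    """Entries grouped by type; every agent reads from and writes to this."""
    def __init__(self):
        self._entries = defaultdict(list)

    def write(self, entry_type, author, content):
        self._entries[entry_type].append({"author": author, "content": content})

    def read(self, entry_type):
        return list(self._entries[entry_type])  # copy, so callers can't mutate state

ws = SharedWorkspace()
ws.write("research", "researcher", "finding 1")
ws.write("analysis", "analyst", "insight about finding 1")
writer_context = ws.read("research") + ws.read("analysis")  # what the Writer sees
```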

Our Architecture: Hybrid

We combine Hub-and-Spoke + Shared Workspace:

graph TB
    O["Orchestrator"]
    O -->|"Phase 1"| R["Researcher(s)"]
    R -->|write| WS["Shared Workspace"]
    O -->|"Phase 2"| A["Analyst"]
    WS -->|read| A
    A -->|write| WS
    O -->|"Phase 3"| W["Writer"]
    WS -->|read| W
    W -->|write| WS
    O -->|"Phase 4"| QG["Quality Gate"]

    style O fill:#1C355E,stroke:#00C9A7,color:white
    style R fill:#00C9A7,stroke:#1C355E,color:#1C355E
    style A fill:#9B8EC0,stroke:#1C355E,color:#1C355E
    style W fill:#FF7A5C,stroke:#1C355E,color:#1C355E
    style WS fill:#F0F4F8,stroke:#1C355E,color:#1C355E
    style QG fill:#FF7A5C,stroke:#1C355E,color:#1C355E

D. Parallelism

Running independent tasks concurrently

The Sequential Problem

# Sequential: 6+ seconds for 3 independent topics
result_1 = researcher.run("Research topic A")   # 2s
result_2 = researcher.run("Research topic B")   # 2s
result_3 = researcher.run("Research topic C")   # 2s
# Parallel: ~2 seconds for all 3 topics
results = await asyncio.gather(
    run_agent_async(researcher, "Research topic A"),
    run_agent_async(researcher, "Research topic B"),
    run_agent_async(researcher, "Research topic C"),
)

3x speedup with zero quality loss.

asyncio.gather in Practice

async def _run_agent_async(self, task: AgentTask) -> str:
    """Run a synchronous agent in a thread pool."""
    loop = asyncio.get_running_loop()   # get_event_loop() is deprecated inside coroutines
    return await loop.run_in_executor(None, self._run_agent, task)

# In the orchestrator:
if len(research_tasks) > 1:
    results = await asyncio.gather(
        *[self._run_agent_async(task) for task in research_tasks]
    )
else:
    results = [self._run_agent(research_tasks[0])]

Why run_in_executor?

Our ReactAgent.run() is synchronous (uses litellm.completion). We wrap it in a thread pool executor so it can run inside an async event loop without blocking.
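
The effect is easy to verify with a stub: three blocking 0.1 s calls overlap in the default thread pool (blocking_run stands in for the synchronous ReactAgent.run):

```python
import asyncio
import time

def blocking_run(task):
    """Stand-in for a synchronous agent call."""
    time.sleep(0.1)
    return f"done: {task}"

async def run_agent_async(task):
    # Hand the blocking call to the default thread-pool executor
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(None, blocking_run, task)

async def main():
    start = time.perf_counter()
    results = await asyncio.gather(*(run_agent_async(t) for t in "ABC"))
    return results, time.perf_counter() - start

results, elapsed = asyncio.run(main())
# elapsed lands near 0.1 s, not 0.3 s: the three sleeps run concurrently
```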

When to Parallelize

| Scenario | Parallelize? | Why |
| --- | --- | --- |
| Research Topic A and Topic B | Yes | Independent, no shared state |
| Research then Analyze | No | Analysis depends on research results |
| Analyze 3 sources independently | Yes | Each analysis is independent |
| Write then Review | No | Review depends on the draft |

Rule: Parallelize tasks that share no data dependencies.

E. Quality Gates

Review loops and conflict resolution

The Review Loop

After the Writer produces a draft, the Analyst reviews it:

graph LR
    W["Writer<br/>produces draft"] --> A["Analyst<br/>reviews draft"]
    A -->|"APPROVED"| F["Final Output"]
    A -->|"revision notes"| W

    style W fill:#FF7A5C,stroke:#1C355E,color:#1C355E
    style A fill:#9B8EC0,stroke:#1C355E,color:#1C355E
    style F fill:#00C9A7,stroke:#1C355E,color:#1C355E

The Analyst checks for:

  1. Factual accuracy against the research
  2. Missing important points
  3. Unsupported claims (low-confidence presented as fact)
  4. Structural issues

Implementing the Quality Gate

async def _quality_gate(self, query, draft, max_revisions=2):
    current_draft = draft
    for revision in range(max_revisions):
        review = self._run_agent(AgentTask(
            agent_name="analyst",
            # The Analyst must actually see the draft it is reviewing
            instructions=f"Review this draft...\n\n{current_draft}\n\nIf acceptable: APPROVED"
        ))

        if "APPROVED" in review.upper():
            return current_draft          # Draft passes!

        # Revision needed — send the draft and notes back to the Writer
        current_draft = self._run_agent(AgentTask(
            agent_name="writer",
            instructions=f"Revise this draft based on feedback:\n{current_draft}\n\nFeedback:\n{review}"
        ))

    return current_draft                  # Max revisions reached

Always Set max_revisions

Without a limit, the Analyst and Writer can enter an infinite review loop. In production, 2-3 revisions is usually enough.
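
The loop's termination behavior is worth checking in isolation. Here quality_gate is a standalone sketch of the method above, with stub callables in place of the agents:

```python
def quality_gate(draft, review_fn, revise_fn, max_revisions=2):
    """Review loop: revise until APPROVED or the revision budget runs out."""
    current = draft
    for _ in range(max_revisions):
        review = review_fn(current)
        if "APPROVED" in review.upper():
            return current
        current = revise_fn(current, review)
    return current   # bounded: can never loop forever

# Stub reviewer that approves on the second pass:
verdicts = iter(["Tighten the intro.", "approved"])
final = quality_gate(
    "v1",
    review_fn=lambda d: next(verdicts),
    revise_fn=lambda d, notes: d + "+rev",
)
# final == "v1+rev": one revision, then approval
```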

The Complete Workflow

sequenceDiagram
    participant O as Orchestrator
    participant R as Researcher(s)
    participant A as Analyst
    participant W as Writer

    O->>R: Phase 1: Research (parallel)
    R-->>O: Raw findings

    O->>A: Phase 2: Analyze findings
    A-->>O: Structured analysis

    O->>W: Phase 3: Write report
    W-->>O: Draft

    O->>A: Phase 4: Review draft
    A-->>O: "Revision needed"
    O->>W: Revise with feedback
    W-->>O: Revised draft
    O->>A: Review again
    A-->>O: "APPROVED"

F. Wrap-up

Key Takeaways

  1. Multi-agent is not always better — start with single agent, split at bottlenecks
  2. Specialization = prompt + tools — same ReactAgent code, different personality
  3. Three patterns: Pipeline, Hub-and-Spoke, Shared Workspace
  4. Parallelize independent tasks — asyncio.gather for 3x+ speedups
  5. Quality gates prevent bad output — but always set a max_revisions limit

Lab Preview: The Newsroom

Step 1: The Specialists

  • Build Researcher, Analyst, Writer
  • Configure system prompts and tools

Step 2: The Orchestrator

  • Implement MultiAgentOrchestrator
  • Parallel research with asyncio.gather

Step 3: The Quality Gate

  • Add the review loop
  • Set max_revisions limit
  • Test with a comparison query

Time: 75 minutes

Questions?

Session 2 Complete